WSE, a new sequence distance measure based on word frequencies.
Internal identifier: 002F72 (Main/Exploration); previous: 002F71; next: 002F73
Authors: Jun Wang [People's Republic of China]; Xiaoqi Zheng
Source:
- Mathematical biosciences [ 0025-5564 ] ; 2008.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- chemical , genetics : DNA, Viral.
- classification : SARS Virus, Viruses.
- genetics : SARS Virus, Viruses.
- Amino Acid Sequence, Base Sequence, Evolution, Molecular, Mathematics, Models, Genetic, Phylogeny.
Abstract
In this article, we present a new distance metric, the Weighted Sequence Entropy (WSE), based on the short word composition of biological sequences. As a revision of the classical relative entropy (RE), our metric (1) works equivalently with RE in the case of small k, (2) avoids the degeneracy when some word types are absent in one sequence but not in the other. Experiments on 25 viruses including SARS-CoVs show that our method and RE give exactly the same phylogenetic tree when word length k < 3. When k > 3, our method still works and gets convergent phylogenetic topology but the RE gives degenerate results.
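The degeneracy mentioned in the abstract can be illustrated with a minimal sketch. It counts overlapping k-letter words, computes the classical relative entropy, and shows that the value becomes infinite as soon as one sequence lacks a word the other contains. The record does not give the WSE formula, so the sketch does not attempt to implement it; as a stand-in it uses the Jensen-Shannon divergence, a different and well-known measure that also stays finite when words are missing. The function names and the toy sequences are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import product


def word_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of each length-k word, using overlapping windows."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words)  # windows with other symbols are ignored
    if total == 0:
        raise ValueError("sequence has no valid words of length k")
    return {w: counts[w] / total for w in words}


def relative_entropy(p, q):
    """Classical relative entropy: sum of p * log2(p / q) over all words.

    Returns infinity when q lacks a word that p contains (the degenerate case).
    """
    total = 0.0
    for word, pw in p.items():
        if pw == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if q[word] == 0:
            return math.inf
        total += pw * math.log2(pw / q[word])
    return total


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (a stand-in, NOT the WSE of the paper).

    Each distribution is compared with the midpoint of the two, which is
    non-zero wherever either input is non-zero, so the result is finite.
    """
    mid = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * relative_entropy(p, mid) + 0.5 * relative_entropy(q, mid)


if __name__ == "__main__":
    a, b = "ACGTACGTTGCA", "ACGGTCATGCAA"  # toy sequences, not real genomes
    for k in (1, 3):
        fa, fb = word_frequencies(a, k), word_frequencies(b, k)
        print(k, relative_entropy(fa, fb), jensen_shannon(fa, fb))
```

With short sequences and larger k, most of the 4^k possible words are absent from at least one sequence, which is why the relative entropy degenerates while a midpoint-based measure does not.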
DOI: 10.1016/j.mbs.2008.06.001
PubMed: 18590747
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 001B11
- to stream PubMed, to step Curation: 001B11
- to stream PubMed, to step Checkpoint: 001970
- to stream Ncbi, to step Merge: 001C89
- to stream Ncbi, to step Curation: 001C89
- to stream Ncbi, to step Checkpoint: 001C89
- to stream Main, to step Merge: 003036
- to stream Main, to step Curation: 002F72
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">WSE, a new sequence distance measure based on word frequencies.</title>
<author><name sortKey="Wang, Jun" sort="Wang, Jun" uniqKey="Wang J" first="Jun" last="Wang">Jun Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, PR China. junwang@dlut.edu.cn</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Applied Mathematics, Dalian University of Technology, Dalian 116024</wicri:regionArea>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Zheng, Xiaoqi" sort="Zheng, Xiaoqi" uniqKey="Zheng X" first="Xiaoqi" last="Zheng">Xiaoqi Zheng</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2008">2008</date>
<idno type="RBID">pubmed:18590747</idno>
<idno type="pmid">18590747</idno>
<idno type="doi">10.1016/j.mbs.2008.06.001</idno>
<idno type="wicri:Area/PubMed/Corpus">001B11</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001B11</idno>
<idno type="wicri:Area/PubMed/Curation">001B11</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001B11</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001970</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001970</idno>
<idno type="wicri:Area/Ncbi/Merge">001C89</idno>
<idno type="wicri:Area/Ncbi/Curation">001C89</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001C89</idno>
<idno type="wicri:doubleKey">0025-5564:2008:Wang J:wse:a:new</idno>
<idno type="wicri:Area/Main/Merge">003036</idno>
<idno type="wicri:Area/Main/Curation">002F72</idno>
<idno type="wicri:Area/Main/Exploration">002F72</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">WSE, a new sequence distance measure based on word frequencies.</title>
<author><name sortKey="Wang, Jun" sort="Wang, Jun" uniqKey="Wang J" first="Jun" last="Wang">Jun Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, PR China. junwang@dlut.edu.cn</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Applied Mathematics, Dalian University of Technology, Dalian 116024</wicri:regionArea>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Zheng, Xiaoqi" sort="Zheng, Xiaoqi" uniqKey="Zheng X" first="Xiaoqi" last="Zheng">Xiaoqi Zheng</name>
</author>
</analytic>
<series><title level="j">Mathematical biosciences</title>
<idno type="ISSN">0025-5564</idno>
<imprint><date when="2008" type="published">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Amino Acid Sequence</term>
<term>Base Sequence</term>
<term>DNA, Viral (genetics)</term>
<term>Evolution, Molecular</term>
<term>Mathematics</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
<term>SARS Virus (classification)</term>
<term>SARS Virus (genetics)</term>
<term>Viruses (classification)</term>
<term>Viruses (genetics)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN viral (génétique)</term>
<term>Mathématiques</term>
<term>Modèles génétiques</term>
<term>Phylogénie</term>
<term>Séquence d'acides aminés</term>
<term>Séquence nucléotidique</term>
<term>Virus ()</term>
<term>Virus (génétique)</term>
<term>Virus du SRAS ()</term>
<term>Virus du SRAS (génétique)</term>
<term>Évolution moléculaire</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en"><term>DNA, Viral</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en"><term>SARS Virus</term>
<term>Viruses</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>SARS Virus</term>
<term>Viruses</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>ADN viral</term>
<term>Virus</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Amino Acid Sequence</term>
<term>Base Sequence</term>
<term>Evolution, Molecular</term>
<term>Mathematics</term>
<term>Models, Genetic</term>
<term>Phylogeny</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Mathématiques</term>
<term>Modèles génétiques</term>
<term>Phylogénie</term>
<term>Séquence d'acides aminés</term>
<term>Séquence nucléotidique</term>
<term>Virus</term>
<term>Virus du SRAS</term>
<term>Évolution moléculaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this article, we present a new distance metric, the Weighted Sequence Entropy (WSE), based on the short word composition of biological sequences. As a revision of the classical relative entropy (RE), our metric (1) works equivalently with RE in the case of small k, (2) avoids the degeneracy when some word types are absent in one sequence but not in the other. Experiments on 25 viruses including SARS-CoVs show that our method and RE give exactly the same phylogenetic tree when word length k &lt; 3. When k &gt; 3, our method still works and gets convergent phylogenetic topology but the RE gives degenerate results.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
</country>
</list>
<tree><noCountry><name sortKey="Zheng, Xiaoqi" sort="Zheng, Xiaoqi" uniqKey="Zheng X" first="Xiaoqi" last="Zheng">Xiaoqi Zheng</name>
</noCountry>
<country name="République populaire de Chine"><noRegion><name sortKey="Wang, Jun" sort="Wang, Jun" uniqKey="Wang J" first="Jun" last="Wang">Jun Wang</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
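As a rough illustration of how such a record might be read outside Dilib, the sketch below uses Python's standard `xml.etree.ElementTree`. The record uses the `wicri:` and `nlm:` prefixes without declaring them, so a strict parser rejects it as written; the sketch therefore injects namespace declarations into the root tag before parsing. The namespace URIs are placeholders invented for this example, not official identifiers, and `SAMPLE` is an abbreviated copy of the record above.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumptions): the record does not declare these prefixes,
# so any URI will do as long as the parser can bind them.
NS = {"wicri": "urn:example:wicri", "nlm": "urn:example:nlm"}

# Abbreviated copy of the record, kept only to make the sketch self-contained.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">WSE, a new sequence distance measure based on word frequencies.</title>
<author><name sortKey="Wang, Jun">Jun Wang</name>
<affiliation wicri:level="1"><nlm:affiliation>Dalian University of Technology</nlm:affiliation>
</affiliation></author>
<author><name sortKey="Zheng, Xiaoqi">Xiaoqi Zheng</name></author>
</titleStmt>
<publicationStmt><idno type="pmid">18590747</idno>
<idno type="doi">10.1016/j.mbs.2008.06.001</idno></publicationStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term><term>Phylogeny</term></keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""


def parse_record(xml_text):
    """Parse a record after declaring the otherwise unbound prefixes."""
    decl = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    patched = xml_text.replace("<record>", f"<record {decl}>", 1)
    return ET.fromstring(patched)


def summarize(record):
    """Collect a few bibliographic fields from the parsed record."""
    return {
        "title": record.findtext(".//titleStmt/title"),
        "authors": [n.text for n in record.findall(".//titleStmt/author/name")],
        "pmid": record.findtext(".//publicationStmt/idno[@type='pmid']"),
        "doi": record.findtext(".//publicationStmt/idno[@type='doi']"),
        "keywords": [t.text for t in
                     record.findall(".//keywords[@scheme='KwdEn']/term")],
    }


if __name__ == "__main__":
    for field, value in summarize(parse_record(SAMPLE)).items():
        print(f"{field}: {value}")
```

Patching the root tag by string replacement is a shortcut that relies on the record starting with a bare `<record>` element; a more robust pipeline would declare the namespaces where the record is generated.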
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002F72 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002F72 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= SrasV1 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:18590747 |texte= WSE, a new sequence distance measure based on word frequencies. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:18590747" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a SrasV1
This area was generated with Dilib version V0.6.33.